Super-Scalar Database Compression between RAM and CPU Cache
Authors
Abstract
Data-intensive query processing tasks like data mining, scientific data analysis, and decision support can leave a database system severely I/O-bound, even when common RAID configurations are used. Traditionally, this problem has been tackled by adding more and more disks, connected through expensive interconnect networks. This brute-force approach results in systems whose price is dominated by the cost of their disk subsystems, and in which much disk space is wasted, as disks are added only to gain bandwidth. A more subtle and cost-effective solution can be found in data compression, which has the potential to alleviate the I/O bottleneck. However, traditional algorithms like Huffman coding, arithmetic coding, and Lempel-Ziv style dictionary methods are not suited for this goal due to their high processing overheads. To be of practical value, even on common RAID configurations, decompression algorithms should be capable of producing roughly one byte per CPU cycle on modern hardware, or around three gigabytes per second. To achieve this, a case is made for novel, light-weight compression algorithms that exploit both the structure of the underlying database and the characteristics of modern CPUs. In general, performance is preferred over compression ratio, and algorithms should strive to extract maximum instructions-per-clock-cycle (IPC) from modern CPUs. Furthermore, to rule out main-memory bottlenecks, candidate algorithms should allow for incremental, into-cache decompression. This work introduces three novel compression schemes (PFOR, PFOR-DELTA, and PDICT) designed toward these goals. Experimental results show that these methods can significantly alleviate I/O bottlenecks, thereby effectively increasing the performance of today's hierarchical-memory systems.
Similar resources
MonetDB/X100 - A DBMS In The CPU Cache
X100 is a new execution engine for the MonetDB system that improves execution speed and overcomes its main-memory limitation. It introduces the concept of in-cache vectorized processing, which strikes a balance between the existing column-at-a-time MIL execution primitives of MonetDB and the tuple-at-a-time Volcano pipelining model, avoiding their drawbacks: intermediate result materialization a...
Efficient and Flexible Information Retrieval using MonetDB/X100
Today’s large-scale IR systems are not implemented using general-purpose database systems, as the latter tend to be significantly less efficient than custom-built IR engines. This paper demonstrates how recent developments in hardware-conscious database architecture may however satisfy IR needs. The advantage is flexibility of experimentation, as implementing a retrieval system on top of a DBMS ...
Cache Conscious Algorithms for Relational Query Processing
The current main memory (DRAM) access speeds lag far behind CPU speeds. Cache memory, made of static RAM, is used in today’s architectures to bridge this gap. It provides access latencies of 2-4 processor cycles, in contrast to main memory, which requires 15-25 cycles. Therefore, the performance of the CPU depends upon how well the cache can be utilized. We show that there are significant ...
New model for arithmetic coding/decoding of multilevel images based on a cache memory
In this work we present new methodologies for arithmetic encoding and decoding of multilevel images, achieving important improvements in cycle length and reducing complexity. Entropy coding methods should carry out operations of maintenance and search in tables, the size of which depends on the number of symbols of the alphabet. In this work we reduce the size of the table by introducing a new...
Comparison of Energy and Performance Efficiency of Recent Cache Configuration Trends and Designs
Over the past ten years, cache configurations and their designs have increased in importance, energy efficiency, speed, and in other aspects as well. The goal of this paper is to discuss the energy and performance efficiency of different cache designs throughout the years. A good efficiency scale for a cache could be considered a measure of its latency and energy consumption. To optimize cache e...
Journal:
Volume / Issue:
Pages: -
Publication date: 2005